Gradient descent approaches to neural-net-based solutions of the Hamilton-Jacobi-Bellman equation
Abstract
In this paper we investigate new approaches to dynamic-programming-based optimal control of continuous time-and-space systems. We use neural networks to approximate the solution to the Hamilton-Jacobi-Bellman (HJB) equation, which is, in the deterministic case studied here, a first-order, non-linear, partial differential equation. We derive the gradient descent rule for integrating this equation inside the domain, given the conditions on the boundary. We apply this approach to the "Car-on-the-hill" problem, which is a two-dimensional, highly non-linear control problem. We discuss the results obtained and point out the low quality of the approximation of the value function and of the derived control. We attribute this poor approximation to the fact that the HJB equation has many generalized solutions (i.e. differentiable almost everywhere) other than the value function, and our gradient descent method converges to one among these functions, thus possibly failing to find the correct value function. We illustrate this limitation on a simple one-dimensional control problem.
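The non-uniqueness mentioned in the abstract can be seen in a small sketch. The problem below is an assumption chosen for illustration and is not necessarily the one-dimensional problem used in the paper: a minimum-time exit problem on [0, 1] with dynamics dx/dt = u, |u| <= 1, whose HJB equation reduces to |V'(x)| = 1 with V(0) = V(1) = 0. The function names and the squared-residual loss are likewise hypothetical. Both the value function and a sawtooth function give an essentially zero residual away from their kinks, so a method that only minimizes the residual has no way to prefer one over the other.

```python
# Illustrative sketch only: an assumed 1-D minimum-time problem, not the paper's example.
# HJB equation (assumed): 1 - |V'(x)| = 0 on (0, 1), with V(0) = V(1) = 0.

def value_function(x):
    """Minimum time to leave [0, 1] when moving at speed at most 1."""
    return min(x, 1.0 - x)

def spurious_solution(x):
    """Sawtooth with slope +/-1 almost everywhere and the same boundary values."""
    return min(x, abs(x - 0.5), 1.0 - x)

def hjb_residual(v, x, h=1e-4):
    """Residual 1 - |V'(x)|, with V' estimated by a central finite difference."""
    dv = (v(x + h) - v(x - h)) / (2.0 * h)
    return 1.0 - abs(dv)

def mean_squared_residual(v, points):
    """Squared-residual loss of the kind a gradient descent method might minimize."""
    return sum(hjb_residual(v, x) ** 2 for x in points) / len(points)

# Sample points chosen to stay clear of the kinks at 0.25, 0.5 and 0.75.
points = [(i + 0.37) / 100.0 for i in range(100)]

msr_true = mean_squared_residual(value_function, points)
msr_spurious = mean_squared_residual(spurious_solution, points)
max_gap = max(abs(value_function(x) - spurious_solution(x)) for x in points)

print("residual loss, value function:   ", msr_true)
print("residual loss, spurious solution:", msr_spurious)
print("largest gap between the two:     ", max_gap)
```

Both losses come out negligible while the two functions differ substantially near x = 0.5, which is the kind of ambiguity the abstract attributes to generalized solutions.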
Similar resources
Gradient Descent Approaches to Neural-Net-Based Solutions of the Hamilton-Jacobi-...
In this paper we investigate new approaches to dynamic-programming-based optimal control of continuous time-and-space systems. We use neural networks to approximate the solution to the Hamilton-Jacobi-Bellman (HJB) equation which is, in the deterministic case studied here, a first-order, non-linear, partial differential equation. We derive the gradient descent rule for integrating this equation in...
Extended Applicability of the Symplectic Pontryagin Method
Abstract. The Symplectic Pontryagin method was introduced in a previous paper. This work shows that this method is applicable under less restrictive assumptions. Solutions to the Symplectic Pontryagin scheme are shown to exist without the previous assumption on a bounded gradient of the discrete dual variable. The convergence proof uses the representation of solutions to a Hamilton...
A Study of Reinforcement Learning in the Continuous Case by the Means
This paper proposes a study of Reinforcement Learning (RL) for continuous state-space and time control problems, based on the theoretical framework of viscosity solutions (VSs). We use the method of dynamic programming (DP), which introduces the value function (VF), the expectation of the best future cumulative reinforcement. In the continuous case, the value function satisfies a non-linear first (or sec...
Nonlinear Optimal Control Techniques Applied to a Launch Vehicle Autopilot
This paper presents an application of nonlinear optimal control techniques to the design of launch vehicle autopilots. The optimal control is given by the solution to the Hamilton-Jacobi-Bellman (HJB) equation, which in this case cannot be solved explicitly. A method based upon Successive Galerkin Approximation (SGA) is used to obtain an approximate optimal solution. Simulation results invo...
Using Neural Networks for Fast Reachable Set Computations
To sidestep the curse of dimensionality when computing solutions to Hamilton-Jacobi-Bellman partial differential equations (HJB PDE), we propose an algorithm that leverages a neural network to approximate the value function. We show that our final approximation of the value function generates near optimal controls which are guaranteed to successfully drive the system to a target state. Our fram...
Publication date: 1999